-
Notifications
You must be signed in to change notification settings - Fork 114
Allow combining #[cache] with arguments
#417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
peg-macros/translate.rs
Outdated
| _ => true, | ||
| }) | ||
| .collect(), | ||
| RuleParamTy::Rule(ty) => ty.clone(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will work? This is for a parameter declared as rule<T>, which at the Rust level is a closure returning RuleResult<T>. To work properly, the cache key would need to be the identity of the closure definition site plus any closed-over upvars, which would be quite difficult to do. So this should probably remain forbidden.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You’re right, I improperly assumed that prohibiting generic types was sufficient for this case because I almost never used rule with a concrete type. Now that you point it out, it is embarrassingly obvious that RuleParamTy appearing here actually means I needed to think about it more :-)
peg-macros/translate.rs
Outdated
| RuleParamTy::Rust(ty) => ty | ||
| .clone() | ||
| .into_iter() | ||
| .filter(|tt| match tt { | ||
| // TODO: This needs to do more; at the least, there | ||
| // may be lifetimes. Doing this correctly for every | ||
| // edge case probably requires syn, which is a heavy | ||
| // dependency. | ||
| TokenTree::Punct(punct) => punct.as_char() != '&', | ||
| _ => true, | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides lifetimes, this is going to break for things like &str. I don't think trying to turn a borrowed type into an owned type syntactically is going to be robust enough, even using syn.
The right way would be to use traits to get rustc to do it for you in the type system. ToOwned is close, but is implemented for T rather than &T. It might be hard to handle that because &T implements Clone so there couldn't be a T: Clone blanket impl and a separate one for references.
The simplest and most consistent way would be to just require Clone + Hash + 'static. If you're ok with this cloning the argument implicitly to use as the cache key, it seems like it should also be fine to pass by value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you are saying makes sense.
I approached this largely pragmatically where a move away from “you can’t do this at all” was a step in the right direction, rather than doing the absolute best engineering. (It feels bad to say that when contributing to someone else’s project…) So I considered it OK to just have a compilation error in situations like what you describe with &str, though you make a great point that using <T as ToOwned>::Owned makes more sense—I always forget about this. (AFAIK it would still require stripping the reference in a hacky way like this, though?)
The issue of value versus reference types in cases similar to this is one which has come up in another crate I maintain and I never really came up with a good solution for what to do in the absence of type metaprogramming, and so I am not sure if there is any general solution for this case either.
I think the main issue with requiring cacheable arguments to be passed by value is that cloning is only necessary when inserting to the cache. For rules where there is a cache hit, this would just be wasted work. Additionally, in my case, requiring to pass by value would also require adding Rc, whereas right now it is not needed. These are not dealbreakers but it does feel bad to artificially restrict performance in a feature intended to improve performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I missed the part where you mentioned interior mutability 😱. In that case, Rc doesn't even save you because then you'd be mutating the copy used as the cache HashMap key, which it's not going to like.
I'm kind of inclined to say that rule arguments should be simple, immutable, maybe even Copy types and anything beyond that is asking for trouble. For example, I think pub rules already require arguments to be Copy when they re-parse for error position.
Would it work for your use case to have an escape hatch attribute, maybe #[cache_ignore] that excludes an argument from the cache key but passes it into the rule anyway? You would pass your context twice, once immutable by value for the cache key, and once as a mutable reference and #[cache_ignored] for the rule to mutate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, Rc doesn't even save you because then you'd be mutating the copy used as the cache HashMap key, which it's not going to like.
Since there are no blanket implementations of Hash for interior-mutable types, I have to implement that trait myself, so I think this is no issue, as the Rc implementation is just a pass-through, and it really would just be the performance loss?
Would it work for your use case to have an escape hatch attribute, maybe #[cache_ignore] that excludes an argument from the cache key but passes it into the rule anyway?
I am not sure, I think I would have to run some tests, or wait for a day when I can think better :-). The context would actually be Copy except for these damnable interior-mutable properties :-).
To explain more what is going on here in the first place: The original Wikitext parser did not backtrack for mismatched pairs; once it saw [[ {{ it would only ever look for }}. This means the obvious rule for parsing templates…
rule template()
= "{{" t:template_content() "}}" { Token::Template(t) }
/ "{{" { Token::Text }
…does not work, because for the input "[[ {{ ]]" it would produce Link { content: Text("{{") } while the “correct” production is actually Text("[[ {{ ]]").
To address this, whenever a mismatched pair is encountered, the fallback subexpression (rule broken_<whatever>) consumes the opening delimiter and then clears ctx.prod_kind. This causes whichever terminator had been set by the parent rule to no longer match in inline_breaks (the rule which decides whether an expression is finished), which causes the child to consume it instead of stopping at the terminator, which makes the parent rule fail and match its fallback subexpression, and on and on, until all the previous unclosed delimiters are consumed as text.
So, again using the example "[[ {{ ]]", the order of operation becomes:
| rule a | rule b | rule c | result | |
|---|---|---|---|---|
| attempt 1 | wikilink | template | text "]]" | ERROR (no "}}") |
ctx.prod_kind: |
Some(Link) |
Some(Template) |
||
| attempt 2 | wikilink | broken_template | text "{{ ]]" | ERROR (no "]]") |
ctx.prod_kind: |
Some(Link) |
None |
||
| attempt 3 | broken_wikilink | broken_template | text "[[ {{ ]]" | OK |
ctx.prod_kind: |
None | None |
And the rule looks (approximately) like:
rule template(ctx: &Context)
= t:(
ctx:({ ctx.with_prod_kind(Some(ProdKind::Template)) })
"{{" t:template_content(&ctx) inline_breaks(&ctx) "}}" { Token::Template(t) }
/ /* broken_template */ "{{" { ctx.prod_kind.set(None); Token::Text }
)
{ t }
It would be great to find another way to handle this but this is what the upstream developers came up with in their own PEG which I have adapted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The simplest and most consistent way would be to just require
Clone + Hash + 'static.
It seems to me, that most idiomatic way is to introduce Cacheable trait, implement it for types, that you known is safe to cache (maybe via blanket impl), and cache them using methods on that trait.
ea024a6 to
9398558
Compare
This is a naïve first attempt at enabling caching with arguments, supporting both values and ownable references. References are less efficient because it is extremely difficult to satisfy the constraint `K: Borrow<Q>` for `HashMap` with a tuple. Most recommendations seem to be to either do some contortions with trait objects, or to use the hashbrown crate and implement `hashbrown::Equivalent`, but I have not been able to successfully think through how to do this in a way which works with arbitrary tuples of foreign types. Still, a half-measure which can be improved later is hopefully better than nothing.
9398558 to
68a9108
Compare
|
I’ve pushed an update which:
I stared into the void for a few hours to see if I could figure out a way to avoid cloning reference arguments for I also notice now that I misread your earlier question about “what about passing it in twice?”—apparently my brain instead substituted “what about always only ever allowing values?”. That would certainly work, though I am struggling to think about where the benefit comes from doing that, are you able to say more about it? My apologies! |
This is based on the commits in #416, because I did not want to deal with slow tests :-)
I was adapting a parser for a format (wikitext) which has pathological backtracking behaviour which demands caching to avoid quadratic parsing, but also the format is context-sensitive so rules need to receive arguments in order to know which tokens are terminators or not according to the enclosing context.
This patch adds support for combining
#[cache]with arguments for the specific use case I had, which is passing around a context object with interior mutability ( 😱 ) by reference.If you have separate suggestions about how to deal with such an unpleasant format, I am all ears :-)
Thanks, and after years, I still really enjoy working with rust-peg every time I have a need for a non-trivial parser!